Learning in communication systems

ABSTRACT

Apparatuses, methods and computer programs are described including receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system includes a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; receiving a reward or loss function from the receiver based on the modified sequence of messages; and training at least some weights of the correction algorithm based on the received reward or loss function.

BACKGROUND

A simple communications system includes a transmitter, a transmission channel, and a receiver. In some implementations, the transmitter-receiver pair may not achieve the best possible performance. There remains a need for improving the performance of such systems.

SUMMARY

In a first aspect, this specification describes an apparatus comprising: means for receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system comprises a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; means for converting the received sequence of messages into a converted sequence of messages using the correction algorithm; means for receiving a reward or loss function from the receiver; and means for training at least some weights of the correction algorithm based on the received reward or loss function. In some embodiments, there may be provided means for generating the reward or loss function.

Some embodiments include: means for modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and means for providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages. The said means for modifying the converted sequence of messages may make use of a distribution to generate the perturbations. The perturbations may be zero-mean Gaussian perturbations.

The reward or loss function may be related to one or more of block error rate, bit error rate, error vector magnitude, mean square error in estimation and categorical cross-entropy.

In some embodiments, there may be provided means for repeating the training of the at least some weights of the correction algorithm until a first condition (such as a defined number of iterations and/or a defined performance level) is reached.

The means for training may comprise optimising one or more of a batch size of the sequence of messages, a learning rate, and a distribution of perturbations.

The means for training at least some weights of the correction algorithm may comprise using reinforcement learning or stochastic gradient descent.

In a second aspect, this specification describes an apparatus comprising: means for obtaining or generating a sequence of messages for transmission over a transmission system, wherein the transmission system comprises a transmitter, a channel, a correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; means for receiving the transmitted sequence of messages at the correction module; means for converting the received sequence of messages into a converted sequence of messages using the correction algorithm; means for generating a reward or loss function at the receiver; and means for training at least some weights of the correction algorithm based on the reward or loss function.

Some embodiments include: means for modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and means for providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages. The means for modifying the converted sequence of messages may make use of a distribution to generate the perturbations. The perturbations may be zero-mean Gaussian perturbations.

The reward or loss function may be related to one or more of block error rate, bit error rate, error vector magnitude, mean square error in estimation and categorical cross-entropy.

In some embodiments, there may be provided means for repeating the training of the at least some weights of the correction algorithm until a first condition (such as a defined number of iterations and/or a defined performance level) is reached.

The means for training may comprise optimising one or more of a batch size of the sequence of messages, a learning rate, and a distribution of perturbations.

The means for training at least some weights of the correction algorithm may comprise using reinforcement learning or stochastic gradient descent.

In either the first or the second aspect, the said means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the performance of the apparatus.

In a third aspect, this specification describes a method comprising: receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system comprises a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; receiving a reward or loss function from the receiver; and training at least some weights of the correction algorithm based on the received reward or loss function. The method may further comprise: modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages. The method may make use of a distribution to generate the perturbations.

In a fourth aspect, this specification describes a method comprising: obtaining or generating a sequence of messages for transmission over a transmission system, wherein the transmission system comprises a transmitter, a channel, a correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; receiving the transmitted sequence of messages at the correction module; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; generating a reward or loss function at the receiver; and training at least some weights of the correction algorithm based on the reward or loss function. The method may further comprise: modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages. The method may make use of a distribution to generate the perturbations.

In a fifth aspect, this specification describes an apparatus configured to perform any method as described with reference to the third or fourth aspect.

In a sixth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the third or fourth aspect.

In a seventh aspect, this specification describes a computer program comprising instructions stored thereon for performing at least the following: receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system comprises a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; receiving a reward or loss function from the receiver; and training at least some weights of the correction algorithm based on the received reward or loss function. The computer program may further comprise instructions stored thereon for performing at least the following: modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages.

In an eighth aspect, this specification describes a computer program comprising instructions stored thereon for performing at least the following: obtaining or generating a sequence of messages for transmission over a transmission system, wherein the transmission system comprises a transmitter, a channel, a correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; receiving the transmitted sequence of messages at the correction module; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; generating a reward or loss function at the receiver; and training at least some weights of the correction algorithm based on the reward or loss function. The computer program may further comprise instructions stored thereon for performing at least the following: modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages.

In a ninth aspect, this specification describes a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system comprises a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; receiving a reward or loss function from the receiver; and training at least some weights of the correction algorithm based on the received reward or loss function.

In a tenth aspect, this specification describes a non-transitory computer-readable medium comprising program instructions stored thereon for performing at least the following: obtaining or generating a sequence of messages for transmission over a transmission system, wherein the transmission system comprises a transmitter, a channel, a correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; receiving the transmitted sequence of messages at the correction module; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; generating a reward or loss function at the receiver; and training at least some weights of the correction algorithm based on the reward or loss function.

BRIEF DESCRIPTION OF THE DRAWINGS

Example embodiments will now be described, by way of non-limiting examples, with reference to the following schematic drawings, in which:

FIG. 1 is a block diagram of a communication system in which example embodiments may be implemented;

FIG. 2 is a block diagram of an example end-to-end communication system in accordance with an example embodiment;

FIG. 3 is a block diagram of a module that may be used in the example communication system of FIG. 2;

FIG. 4 is a block diagram of an example end-to-end communication system in accordance with an example embodiment;

FIG. 5 is a flow chart showing an algorithm in accordance with an example embodiment;

FIG. 6 is a block diagram of an example end-to-end communication system in accordance with an example embodiment;

FIG. 7 is a flow chart showing an algorithm in accordance with an example embodiment;

FIG. 8 is a block diagram of a system in accordance with an example embodiment; and

FIGS. 9a and 9b show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to embodiments.

DETAILED DESCRIPTION

FIG. 1 is a block diagram of a communication system, indicated generally by the reference numeral 1, in which example embodiments may be implemented. The system 1 includes a transmitter 2, a channel 4 and a receiver 6. As shown in FIG. 1, a transmission application 8 of the system 1 provides an input symbol (s) (also called a message) to the transmitter 2. The symbol/message (s) is transmitted to the receiver 6 via the channel 4. An output symbol (ŝ) is then provided to a receiver application 10 of the system 1.

The transmitter 2 seeks to communicate one out of M possible messages s ∈ 𝕄 = {1, 2, . . . , M} to the receiver 6. To this end, the transmitter 2 sends a complex-valued vector representation x = x(s) ∈ ℂ^(n) of the message through the channel 4. Generally, the transmitter hardware imposes constraints on x, e.g., an energy constraint ∥x∥₂² ≤ n, an amplitude constraint |x_(i)| ≤ 1 ∀i, or an average power constraint 𝔼[|x_(i)|²] ≤ 1 ∀i. The channel is described by the conditional probability density function (pdf) p(y|x), where y ∈ ℂ^(n) denotes the received signal. Upon reception of y, the receiver produces the estimate ŝ of the transmitted message s.
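The hardware constraints on x can be illustrated with a minimal sketch. The function names below are hypothetical and do not appear in the specification; the sketch only shows one way a transmit vector might be scaled so that its energy does not exceed n, or clipped so that no component exceeds unit amplitude.

```python
import numpy as np

def enforce_energy_constraint(x: np.ndarray) -> np.ndarray:
    """Scale x so that its energy does not exceed n (n = number of components)."""
    n = x.size
    energy = np.sum(np.abs(x) ** 2)
    if energy <= n:
        return x  # already feasible, leave unchanged
    return x * np.sqrt(n / energy)

def enforce_amplitude_constraint(x: np.ndarray) -> np.ndarray:
    """Clip each component to |x_i| <= 1 while preserving its phase."""
    magnitude = np.abs(x)
    # Guard against division by zero for components that are exactly zero.
    scale = np.minimum(1.0, 1.0 / np.maximum(magnitude, 1e-12))
    return x * scale
```

Scaling the whole vector keeps the relative geometry of the transmitted symbols, whereas per-component clipping distorts it; which is appropriate depends on the hardware.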

The transmitter 2, channel 4 and receiver 6 may take many different forms. For example, the transmitter 2 may include a module (such as a neural network) for implementing a transmitter algorithm and the receiver 6 may include a module (such as a neural network) for implementing a receiver algorithm. The transmitter and receiver modules may be trained in order to optimise the performance of the system as a whole according to some metric. However, this is not essential to all embodiments. Indeed, in some embodiments, the existence or details of such modules may be unknown.

In many cases, the transmitter/receiver pair does not achieve the best possible performance. This may, for example, be because the transmitter/receiver pair are designed to suit a wide variety of applications and channel conditions.

FIG. 2 is a block diagram of an example end-to-end communication system, indicated generally by the reference numeral 20, in accordance with an example embodiment. The system 20 includes the transmitter 2, channel 4 and transmitter application 8 of the system 1 described above. The system 20 also includes a receiver 24 and receiver application 26 similar to the receiver 6 and receiver application 10 described above. Further, the system 20 includes a receiver signal pre-processor (RSP) module 22.

As described above with reference to FIG. 1, the transmitter 2 seeks to communicate one out of M possible messages s ∈ 𝕄 = {1, 2, . . . , M} to the receiver 24. To this end, the transmitter 2 sends a complex-valued vector representation x = x(s) ∈ ℂ^(n) of the message through the channel 4.

The output of the channel 4 (the vector y) is provided to the input of the receiver signal pre-processor module 22. The module 22 is a correction unit whose objective is to increase the performance of the communication system 20. The module 22 modifies the signal y to provide an output y_(p) that is provided to the receiver 24. The receiver generates an output symbol (ŝ) that is provided to a receiver application 26 of the system 20.

As shown in FIG. 2, the receiver application provides a reward signal r to the receiver signal pre-processor module 22. As described in detail below, the performance of the module 22 is adjusted to maximise the reward r (thereby maximising the performance of the system 20), for example by reinforcement learning. It should be noted that the module 22 does not necessarily require any knowledge of a system model and can merely maximise the reward received from the receiver application 26.

FIG. 3 is a block diagram of an example implementation of the receiver signal pre-processor module 22 described above with reference to FIG. 2. In the example implementation shown in FIG. 3, the module 22 is implemented using a deep feedforward neural network (NN) comprising a number of dense layers (a first layer 32 and an lth layer 34 are shown in FIG. 3 by way of example only).

The receiver signal pre-processor module 22 defines the mapping RSP: ℂ^(n) → ℂ^(n). In other words, the module 22 maps an n-dimensional complex-valued vector y that forms the received channel symbols (received from the channel 4 in the example system 20) to pre-processed channel symbols y_(p) from the same set.

As shown in FIG. 3, the module 22 may be a deep feedforward neural network. However, other implementations, including other neural network implementations, are possible.
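The mapping from an n-dimensional complex vector to another n-dimensional complex vector through dense layers can be sketched as follows. This is a minimal illustration only: the class name `DenseRSP`, the single hidden layer, the ReLU activation, the residual connection and the weight initialisation are all assumptions and are not taken from the specification.

```python
import numpy as np

class DenseRSP:
    """Feedforward network mapping C^n to C^n via a real vector of length 2n."""

    def __init__(self, n: int, hidden: int = 32, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.n = n
        # Trainable weights of two dense layers (hypothetical sizes).
        self.W1 = rng.normal(0.0, 0.1, size=(hidden, 2 * n))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, size=(2 * n, hidden))
        self.b2 = np.zeros(2 * n)

    def __call__(self, y: np.ndarray) -> np.ndarray:
        v = np.concatenate([y.real, y.imag])        # complex -> real, length 2n
        h = np.maximum(0.0, self.W1 @ v + self.b1)  # hidden dense layer with ReLU
        out = v + self.W2 @ h + self.b2             # residual output layer (assumption)
        return out[: self.n] + 1j * out[self.n:]    # real -> complex, length n
```

The residual connection means that an untrained network starts close to the identity mapping, so the pre-processor initially leaves the received signal almost unchanged.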

It is also possible that the module 22 uses a longer channel output vector ỹ ∈ ℂ^(ñ) with ñ>n as input, resulting from the reception of multiple subsequent messages, to produce the pre-processed vector y_(p).

FIG. 4 is a block diagram of an example end-to-end communication system, indicated generally by the reference numeral 40, that may be used for training an example signal pre-processor module.

The system 40 includes the transmitter 2, channel 4 and transmitter application 8 described above. The system 40 also includes a signal pre-processor module 42, receiver 44 and receiver application 46 similar to the module 22, receiver 24 and receiver application 26 described above. Further, the system 40 includes a mixer 48 and a training algorithm indicated schematically by the reference numeral 50.

FIG. 5 is a flow chart showing an algorithm, indicated generally by the reference numeral 60, in accordance with an example embodiment. As described in detail below, the algorithm 60 may be used for training the module 42 of the system 40.

The algorithm 60 starts at operation 62, where the transmitter 2 and the receiver 44 of the transmission system 40 are initialised.

At operation 64 of the algorithm 60, the transmitter application 8 generates a set of N messages S={s_(i), i=1, . . . , N} and the transmitter 2 computes the corresponding output vectors x_(i) for each s_(i).

At operation 66, the vectors x_(i) are transmitted over the channel 4. The corresponding channel outputs are denoted by y_(i), i=1, . . . , N.

At operation 68, the receiver signal pre-processor (RSP) module 42 generates outputs y_(p,i) for all i (where y_(p,i) is the output of the signal pre-processor module 42 for the input y_(i), such that y_(p,i)=RSP(y_(i))) and the mixer 48 generates the outputs ỹ_(p,i) for all i.

The mixer 48 generates the outputs ỹ_(p,i) by adding a small perturbation w_(i), i=1, . . . , N, drawn from a known random distribution to the vector y_(p,i), such that ỹ_(p,i)=y_(p,i)+w_(i).
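The mixer operation can be sketched as below, assuming (as one possible choice of known distribution) zero-mean circularly symmetric complex Gaussian perturbations. The function name `mixer` and the parameter `sigma` are hypothetical.

```python
import numpy as np

def mixer(y_p: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Return y_p + w, where w is zero-mean complex Gaussian noise.

    Each component of w has total variance sigma**2, split equally
    between its real and imaginary parts.
    """
    scale = sigma / np.sqrt(2)
    w = rng.normal(0.0, scale, size=y_p.shape) + 1j * rng.normal(0.0, scale, size=y_p.shape)
    return y_p + w
```

Setting `sigma` to zero returns the pre-processed vector unchanged, which corresponds to operation without exploration.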

At operation 70, the receiver 44 decodes the pre-processed channel outputs (i.e. the outputs of the mixer 48) into application messages ŝ_(i), i=1, . . . , N, and feeds the application messages to the receiver application 46. The receiver application computes a set of rewards r_(i), i=1, . . . , N.

At operation 72, the signal pre-processor module 42 is optimised, for example by updating trainable parameters (or weights) of the neural network of the module (such as the weights of the layers 32 and 34 described above). The trainable parameters may be updated, for example, using a stochastic gradient descent (SGD) algorithm, by reducing the loss, L, in the objective function:

$L = \frac{1}{N}\sum\limits_{i = 1}^{N} r_{i} \log p\left( \tilde{y}_{p,i} \mid y_{i} \right)$

Here p(ỹ_(p,i)|y_(i)) denotes the probability density of the perturbed output ỹ_(p,i) given the channel output y_(i). The gradient of the objective function L is computed with respect to the trainable parameters θ of the signal pre-processor module 42. This gradient, ∇_(θ)L, is also known as the policy gradient.
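As a worked illustration of the policy gradient, assume (this is an assumption made here for concreteness) that the perturbations are zero-mean Gaussian with covariance σ²I when the complex vectors are treated as real vectors of dimension 2n, and write p(ỹ_(p,i)|y_(i)) for the density of the perturbed output given the channel output. Then:

$\log p\left( \tilde{y}_{p,i} \mid y_{i} \right) = -\frac{\left\| \tilde{y}_{p,i} - \mathrm{RSP}_{\theta}(y_{i}) \right\|_{2}^{2}}{2\sigma^{2}} - n\log\left( 2\pi\sigma^{2} \right)$

$\nabla_{\theta}L = \frac{1}{N}\sum\limits_{i = 1}^{N} r_{i}\,\nabla_{\theta}\log p\left( \tilde{y}_{p,i} \mid y_{i} \right) = \frac{1}{N\sigma^{2}}\sum\limits_{i = 1}^{N} r_{i}\, J_{\theta}(y_{i})^{T}\left( \tilde{y}_{p,i} - \mathrm{RSP}_{\theta}(y_{i}) \right)$

where J_θ(y_(i)) denotes the Jacobian of RSP_θ(y_(i)) with respect to θ, and the sampled output ỹ_(p,i) is held fixed when differentiating. Under this assumption the gradient depends on the rewards only through their values r_(i), which is consistent with the reward not needing to be differentiable.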

The goal of the optimisation is to improve a chosen performance metric (the reward), thereby improving metrics such as block error rate (BLER), bit error rate (BER), error vector magnitude, mean squared error in estimate, categorical cross-entropy, etc. It should be noted that the reward r does not necessarily need to be differentiable.

The optimisation parameters may take many different forms. For example, the batch size N, the learning rate, and other parameters of the chosen reinforcement learning algorithm (e.g. stochastic gradient descent (SGD) variants such as ADAM, RMSProp, Momentum) are possible optimisation parameters.

At operation 74, a determination is made regarding whether the algorithm 60 is complete. If the algorithm is deemed to be complete, then the algorithm terminates. If not, the algorithm returns to operation 62 and the operations 62 to 74 are repeated. The operation 74 may take many different forms. For example, the algorithm 60 may be deemed complete after a fixed number of training iterations, when the loss function L has not decreased during a fixed number of iterations, when a loss function meets a desired value, or a combination of such features. Other implementations of the operation 74 are also possible.
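The operations of the algorithm 60 can be sketched end to end on a toy problem. Everything specific in this sketch is an assumption made for illustration: the pre-processor is a single trainable complex weight `theta`, the channel applies a fixed hypothetical gain `h` plus noise, the per-message value r is the squared error of the perturbed soft symbol (treated as a loss to be reduced), a batch-mean baseline is subtracted to reduce variance, and termination is after a fixed number of iterations.

```python
import numpy as np

def train_rsp(num_iters=500, batch=64, sigma=0.1, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    h = 0.8 * np.exp(1j * 0.5)  # hypothetical channel gain, unknown to the RSP
    theta = 1.0 + 0.0j          # single trainable complex weight: y_p = theta * y
    history = []
    for _ in range(num_iters):  # operation 74: stop after a fixed number of iterations
        # Operation 64: generate messages and transmit vectors (binary symbols +/-1).
        s = rng.integers(0, 2, size=batch)
        x = 2.0 * s - 1.0
        # Operation 66: transmit over a noisy channel.
        noise = 0.1 * (rng.normal(size=batch) + 1j * rng.normal(size=batch)) / np.sqrt(2)
        y = h * x + noise
        # Operation 68: pre-processor output, then mixer perturbation.
        y_p = theta * y
        w = sigma * (rng.normal(size=batch) + 1j * rng.normal(size=batch))
        y_tilde = y_p + w
        # Operation 70: per-message loss reported by the receiver side.
        r = np.abs(y_tilde - x) ** 2
        # Operation 72: policy-gradient step. For this Gaussian perturbation the
        # gradient of log p with respect to theta is w * conj(y) / sigma**2.
        advantage = r - r.mean()  # baseline subtraction (assumption)
        grad = np.mean(advantage * w * np.conj(y)) / sigma ** 2
        theta = theta - lr * grad
        history.append(r.mean())
    return theta, h, history
```

In this toy setting the trained weight should approximately undo the channel gain, so that `theta * h` ends up close to 1, without the update rule ever using `h` or a model of the receiver.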

Training of the signal pre-processor module 42 may take place on demand. Alternatively, training may take place periodically (e.g. when a defined time has elapsed since training last took place). Many alternative arrangements are possible. For example, training may take place sporadically on an as-needed basis, for example in the event that performance of the signal pre-processor module 42 and/or the communication system 40 is deemed to have degraded (e.g. due to changes in channel or application requirements). Moreover, in some embodiments, the operation 74 may be omitted such that the operation 72 always loops back to the operation 62 (thereby implementing a permanent control loop, such that training of the system 40 never stops).

The training processes described herein encompass a number of variants. The use of reinforcement learning as described herein relies on exploring the policy space (i.e. the space of possible state to action mappings). As described herein, the policy is the mapping implemented by the RSP, the state space is the space of the received signal y_(i) and the action space is ℂ^(n). Exploring can be done in numerous ways, two of the most popular approaches being:

-   Gaussian policy, in which a perturbation vector ε is drawn from a multivariate zero-mean normal distribution and added to the current policy. This ensures exploration “in the neighbourhood” of the current policy.
-   ε-greedy, in which with probability 1−ε, the taken action is the one of the policy, and with probability ε a random action is taken.

The covariance matrix of the normal distribution from which the perturbation vector ε is drawn in the Gaussian policy, and the ε parameter of the ε-greedy approach, are usually fixed parameters, i.e., not learned during training. These parameters control the “amount of exploration”, as making these parameters smaller reduces the amount of random exploration, and favours actions from the current policy.
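The two exploration approaches can be contrasted in a short sketch. The function names are hypothetical, the Gaussian covariance is assumed to be a scaled identity, and `random_action` is a hypothetical caller-supplied function that draws an arbitrary action.

```python
import numpy as np

def gaussian_exploration(action, sigma, rng):
    """Add zero-mean Gaussian noise to the action chosen by the current policy."""
    noise = sigma * (rng.normal(size=action.shape) + 1j * rng.normal(size=action.shape))
    return action + noise

def epsilon_greedy_exploration(action, epsilon, rng, random_action):
    """Keep the policy's action with probability 1 - epsilon, else act randomly."""
    if rng.random() < epsilon:
        return random_action(rng)
    return action
```

In both cases a smaller exploration parameter (`sigma` or `epsilon`) keeps the explored action closer to the action of the current policy.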

The system 40 described above can be used for training the signal pre-processor module 42. However, when not training the signal pre-processor module 42, no perturbation is added to the vector y_(p) and no reward feedback r is required.

FIG. 6 is a block diagram of an example end-to-end communication system, indicated generally by the reference numeral 80, in accordance with an example embodiment. The system 80 does not include the perturbation and reward feedback arrangement of the system 40 and so can be used following the training of the signal pre-processor.

The system 80 includes the transmitter 2, channel 4 and transmitter application 8 described above. The system 80 also includes a signal pre-processor module 82, receiver 84 and receiver application 86 similar to the modules 22 and 42, receivers 24 and 44 and receiver applications 26 and 46 described above.

FIG. 7 is a flow chart showing an algorithm, indicated generally by the reference numeral 90, in accordance with an example embodiment. As described in detail below, the algorithm 90 may be used for operating the system 80.

The algorithm 90 starts at operation 92, where the transmitter application 8 generates a set of N messages S={s_(i), i=1, . . . , N} and the transmitter 2 computes the corresponding output vectors x_(i) for each s_(i).

At operation 94, the vectors x_(i) are transmitted over the channel 4. The corresponding channel outputs are denoted by y_(i), i=1, . . . , N.

At operation 96, the receiver signal pre-processor (RSP) module 82 generates outputs y_(p,i) for all i (where y_(p,i) is the output of the signal pre-processor module 82 for the input y_(i), such that y_(p,i)=RSP(y_(i))).

At operation 98, the receiver 84 decodes the pre-processed channel outputs (i.e. the outputs of the receiver signal pre-processor (RSP) module 82) into application messages ŝ_(i), i=1, . . . , N, and feeds the application messages to the receiver application 86.

There are a number of potential applications of the principles described herein.

A first example application relates to the reconstruction of transmitted data in a lossy system. In this example, the goal of the transmitting application is to communicate vectors s ∈ ℝ^(N) which are reconstructed by the receiving application. That is, the messages s are not drawn from a set of integers but from the field of real numbers. For instance, s could be a digital image and the goal of the receiver could be to construct a vector ŝ ∈ ℝ^(N) as close as possible to s. In this case, the reward r could be the mean-square error (MSE): r=∥s−ŝ∥₂².
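The squared-error reward of this first example can be computed as in the following sketch. The function name is hypothetical, and the value follows the formula given above, i.e. the squared Euclidean norm of the reconstruction error.

```python
import numpy as np

def squared_error_reward(s, s_hat) -> float:
    """Return the squared Euclidean distance between s and its reconstruction."""
    diff = np.asarray(s, dtype=float) - np.asarray(s_hat, dtype=float)
    return float(np.sum(diff ** 2))
```

A perfect reconstruction gives a value of zero, so this quantity is one that training would seek to reduce.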

In a second example, the transmitting application sends a data vector s ∈ ℝ^(N), and the goal of the receiving application is to classify the transmitted vector into one out of M classes. For example, s could be an image and the receiver's goal could be to tell whether s contains a dog or a cat. The receiving application outputs a probability distribution over M classes p_(k), k=1, . . . , M. In this case, the reward r can be the categorical cross-entropy: r=−log p_(l(i)), where l is the function that gives for each training example i its true label l(i)ϵ{1, . . . , M}.
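The categorical cross-entropy for a single training example can be sketched as follows. The function name is hypothetical, the label is 0-based here (whereas the text numbers classes from 1 to M), and the clipping constant is an assumption added to avoid taking the logarithm of zero.

```python
import numpy as np

def cross_entropy_reward(p, label: int) -> float:
    """Return -log p[label] for a probability vector p over M classes."""
    p = np.asarray(p, dtype=float)
    return float(-np.log(np.clip(p[label], 1e-12, 1.0)))
```

The value is zero when the receiving application assigns probability one to the true class and grows as that probability falls.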

In a third example, working only on the transmitter-receiver pair, the principles described herein can be used to reduce the error rate of the transmitter-receiver pair without focusing on a specific application. Assuming a soft decision receiver that outputs a probability distribution over the set of messages p_(s), s ∈ 𝕄, the categorical cross-entropy can be used as the reward: r=−log p_(m), where m ∈ 𝕄 is the actual message sent by the transmitter.

For completeness, FIG. 8 is a schematic diagram of components of one or more of the modules described previously (e.g. signal pre-processor modules, mixers and systems as described above), which hereafter are referred to generically as processing systems 110. A processing system 110 may have a processor 112, a memory 114 closely coupled to the processor and comprised of a RAM 124 and ROM 122, and, optionally, hardware keys 120 and a display 128. The processing system 110 may comprise one or more network interfaces 118 for connection to a network, e.g. a modem which may be wired or wireless.

The processor 112 is connected to each of the other components in order to control operation thereof.

The memory 114 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD). The ROM 122 of the memory 114 stores, amongst other things, an operating system 125 and may store software applications 126. The RAM 124 of the memory 114 is used by the processor 112 for the temporary storage of data. The operating system 125 may contain code which, when executed by the processor, implements aspects of the algorithms 60 and 90.

The processor 112 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors.

The processing system 110 may be a standalone computer, a server, a console, or a network thereof.

In some embodiments, the processing system 110 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 110 may be in communication with the remote server device in order to utilize the software application stored there.

FIGS. 9a and 9b show tangible media, respectively a removable memory unit 165 and a compact disc (CD) 168, storing computer-readable code which when run by a computer may perform methods according to embodiments described above. The removable memory unit 165 may be a memory stick, e.g. a USB memory stick, having internal memory 166 storing the computer-readable code. The memory 166 may be accessed by a computer system via a connector 167. The CD 168 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.

Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.

Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays FPGA, application specific circuits ASIC, signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to express software for a programmable processor firmware such as the programmable content of a hardware device as instructions for a processor or configured or configuration settings for a fixed function device, gate array, programmable logic device, etc.

As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry) and (b) to combinations of circuits and software (and/or firmware), such as (as applicable): (i) to a combination of processor(s) or (ii) to portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions) and (c) to circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.

If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of FIGS. 5 and 7 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.

It will be appreciated that the above-described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.

Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein or any generalization thereof, and during the prosecution of the present application or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.

Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.

It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims. 

1. An apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the apparatus to perform, receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system comprises a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; receiving a reward or loss function from the receiver; and training at least some weights of the correction algorithm based on the received reward or loss function.
 2. An apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages.
 3. An apparatus as claimed in claim 2, wherein the modifying the converted sequence of messages makes use of a distribution to generate the perturbations.
 4. An apparatus as claimed in claim 2, wherein the perturbations are zero-mean Gaussian perturbations.
 5. An apparatus as claimed in claim 1, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform generating the reward or loss function.
 6. An apparatus comprising: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the apparatus to perform, obtaining or generating a sequence of messages for transmission over a transmission system, wherein the transmission system comprises a transmitter, a channel, a correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; receiving the transmitted sequence of messages at the correction module; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; generating a reward or loss function at the receiver; and training at least some weights of the correction algorithm based on the reward or loss function.
 7. An apparatus as claimed in claim 6, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform, modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages.
 8. An apparatus as claimed in claim 7, wherein the modifying the converted sequence of messages makes use of a distribution to generate the perturbations.
 9. An apparatus as claimed in claim 7, wherein the perturbations are zero-mean Gaussian perturbations.
 10. An apparatus as claimed in claim 6, wherein the reward or loss function is related to one or more of block error rate, bit error rate, error vector magnitude, mean square error in estimation and categorical cross-entropy.
 11. An apparatus as claimed in claim 6, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform repeating the training of the at least some weights of the correction algorithm until a first condition is reached.
 12. An apparatus as claimed in claim 11, wherein the first condition is a defined number of iterations and/or a defined performance level.
 13. An apparatus as claimed in claim 6, wherein the training further comprises optimising one or more of a batch size of the sequence of messages, a learning rate, and a distribution of perturbations.
 14. An apparatus as claimed in claim 6, wherein the training at least some weights of the correction algorithm comprises using reinforcement learning.
 15. An apparatus as claimed in claim 6, wherein the training at least some weights of the correction algorithm comprises using stochastic gradient descent.
 16. (canceled)
 17. A method comprising: receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system comprises a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; receiving a reward or loss function from the receiver; and training at least some weights of the correction algorithm based on the received reward or loss function.
 18. A method as claimed in claim 17, further comprising: modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages.
 19. A method comprising: obtaining or generating a sequence of messages for transmission over a transmission system, wherein the transmission system comprises a transmitter, a channel, a correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; receiving the transmitted sequence of messages at the correction module; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; generating a reward or loss function at the receiver; and training at least some weights of the correction algorithm based on the reward or loss function.
 20. A method as claimed in claim 19, further comprising: modifying the converted sequence of messages to provide a modified sequence of messages based on a random perturbation of the converted sequence of messages; and providing the modified sequence of messages to the receiver of the transmission system, wherein the reward or loss function is based on the modified sequence of messages.
 21. A non-transitory computer readable medium storing a computer program comprising instructions, which when executed by a processor, cause an apparatus including the processor to perform the following: receiving a sequence of messages at a correction module of a transmission system, wherein the transmission system comprises a transmitter, a channel, the correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; receiving a reward or loss function from the receiver; and training at least some weights of the correction algorithm based on the received reward or loss function.
 22. A non-transitory computer readable medium storing a computer program comprising instructions, which when executed by a processor, cause an apparatus including the processor to perform the following: obtaining or generating a sequence of messages for transmission over a transmission system, wherein the transmission system comprises a transmitter, a channel, a correction module and a receiver, wherein the correction module includes a correction algorithm having at least some trainable weights; receiving the transmitted sequence of messages at the correction module; converting the received sequence of messages into a converted sequence of messages using the correction algorithm; generating a reward or loss function at the receiver; and training at least some weights of the correction algorithm based on the reward or loss function. 
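
By way of non-limiting illustration only, the operations recited above (converting received messages with a correction algorithm having trainable weights, perturbing the converted messages with zero-mean Gaussian noise, obtaining a loss from the receiver, and training the weights by stochastic gradient descent) might be sketched as follows. Every specific in the sketch is an assumption made for the example and is not taken from the described embodiments: the two-weight affine correction, the toy channel with a fixed gain and offset, the squared-error loss, the batch-mean baseline used to reduce variance, and all names and numeric values (`channel`, `correct`, `receiver_loss`, `train`, `evaluate`, `sigma`, `lr`, `batch`). The gradient is estimated from the reported losses alone, using the score of the Gaussian perturbation, so the receiver is treated as a black box.

```python
import random


def channel(symbol, rng, gain=0.5, offset=0.3, noise_std=0.1):
    """Hypothetical channel: fixed gain and offset plus Gaussian noise."""
    return gain * symbol + offset + rng.gauss(0.0, noise_std)


def correct(weights, y):
    """Correction algorithm (assumed form): affine map with two trainable weights."""
    return weights[0] * y + weights[1]


def receiver_loss(x, sent):
    """Per-message loss reported by the receiver (squared error, assumed)."""
    return (x - sent) ** 2


def train(steps=500, batch=64, lr=0.05, sigma=0.3, seed=0):
    """Train the correction weights from receiver losses only."""
    rng = random.Random(seed)
    weights = [1.0, 0.0]  # start from the identity mapping
    for _ in range(steps):
        samples = []
        for _ in range(batch):
            sent = rng.choice((-1.0, 1.0))   # known training message
            y = channel(sent, rng)           # message received at the correction module
            x = correct(weights, y)          # converted message
            eps = rng.gauss(0.0, 1.0)
            xp = x + sigma * eps             # zero-mean Gaussian perturbation
            samples.append((y, eps, receiver_loss(xp, sent)))
        # Batch-mean baseline: an assumed variance-reduction step.
        baseline = sum(loss for _, _, loss in samples) / batch
        grad = [0.0, 0.0]
        for y, eps, loss in samples:
            # d/dx log N(xp; x, sigma^2) = eps / sigma, chained through x = w0*y + w1
            score = (loss - baseline) * eps / sigma
            grad[0] += score * y / batch
            grad[1] += score / batch
        # Stochastic gradient descent step on the estimated gradient.
        weights[0] -= lr * grad[0]
        weights[1] -= lr * grad[1]
    return weights


def evaluate(weights, n=5000, seed=1):
    """Mean loss on fresh messages, without perturbation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        sent = rng.choice((-1.0, 1.0))
        total += receiver_loss(correct(weights, channel(sent, rng)), sent)
    return total / n


if __name__ == "__main__":
    trained = train()
    print("untrained loss:", evaluate([1.0, 0.0]))
    print("trained loss:  ", evaluate(trained))
```

In this sketch the perturbation is applied only during training; `evaluate` passes the converted messages to the loss unperturbed. The batch size, learning rate and perturbation standard deviation are fixed here for simplicity, although they could themselves be tuned.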